| Strengths | Weaknesses | Our Improvements |
|---|---|---|
| Uses official SingStat data | No data validation shown | Comprehensive data validation & outlier analysis |
| Clear recent trend shown | Limited to 2019-2023 only | Extended analysis: 1990-2022 (32 years) |
| Headline-grabbing impact | Missing socioeconomic factors | Integrated labour force & marital status data |
| Clean, professional format | Static visualisation | Fully interactive dashboard |
| Focuses on key metric | No age-specific breakdown | Age-specific fertility rates by group |
| Accessible to general public | Lacks analytical depth | Multi-layered analytical approach |
Singapore’s Fertility Crisis: A Data-Driven Analysis of Socioeconomic Factors
AAI1001 Team 7 Data Visualisation Project
1 Executive Summary
Singapore’s total fertility rate has plummeted to historic lows, dropping below 1.0 for the first time in 2023. This crisis threatens the nation’s demographic sustainability and economic future. Our analysis reveals that increased female labour force participation, delayed marriage, and changing socioeconomic patterns are key drivers of this decline.
Key Findings:
Fertility rate declined by 41% from 1990 to 2020
Female labour force participation increased by 89% over the same period
The 25-29 age group shows the steepest fertility decline despite being peak childbearing years
Strong negative correlation (-0.87) between labour force participation and fertility rates
2 Introduction
2.1 Background & Significance
Singapore faces a demographic crisis with one of the world’s lowest fertility rates. Understanding the underlying socioeconomic factors is crucial for policy formulation and national planning. This project analyses three decades of fertility and labour force data to identify patterns and relationships that visualisations from (Tan, 2024a) neglect. Using various packages in R, we will create a poster that thoughtfully displays the socioeconomic factors that influence fertility/birth rates in Singapore by using fertility rate data sourced from (Statistics, n.d.) as well as labour participation and marital status data from (Data.gov.sg, n.d.a) and (Data.gov.sg, n.d.b).
Disclaimer: To note that data for 1995, 2000 and 2005 are not available as the Comprehensive Labour Force Survey was not conducted in these years due to the conduct of the Population Census 2000, General Household Surveys 1995 and 2005 by the Singapore Department of Statistics.
2.2 Research Questions
- How do socioeconomic factors influence Singapore’s fertility decline?
- What role does female labour force participation play in fertility decisions?
- Which age groups and marital statuses are most affected?
- Can we identify critical inflection points in the fertility decline?
3 Critical Analysis of Original Visualisation
3.1 Original Visualisation
Source: Straits Times: Singapore’s total fertility rate hits record low in 2023
3.2 Strengths & Weaknesses Analysis
The original visualisations focus on Singapore’s total fertility rate (TFR) from 2019 to 2023, but fail to explore the socioeconomic factors driving the decline. Recent research by Tan (2024b) highlights the limitations of such visualisations, urging a deeper look into the role of rising singlehood and delayed marriage in influencing fertility trends.
4 Data Sources & Methodology
| Dataset | Source | Time Period | Variables | Records |
|---|---|---|---|---|
| Fertility Rates | SingStat | 1960-2024 | Age-specific fertility rates, Total fertility rate | 17 variables wide format |
| Labour Force (Working) | data.gov.sg | 1991-2022 | Female labour force by age & marital status | 5 columns long format |
| Labour Force (Not Working) | data.gov.sg | 1991-2022 | Females outside labour force by age & marital status | 5 columns long format |
4.1 Data Engineering Pipeline
Show Code
# Load datasets with proper error handling
fertility <- read_csv(
"datasets/ResidentFertilityRate.csv",
skip = 9,
n_max = 17,
show_col_types = FALSE
)
work <- read_csv("datasets/ResidentLabourForceAged15YearsandOverbyMaritalStatusAgeandSex.csv",
show_col_types = FALSE)
not_working <- read_csv("datasets/ResidentsOutsidetheLabourForceAged15YearsandOverbyMaritalStatusAgeandSex.csv",
show_col_types = FALSE)
cat("✓ Data loaded successfully\n")✓ Data loaded successfully
Show Code
cat("Fertility data shape:", dim(fertility), "\n")Fertility data shape: 17 66
Show Code
cat("Labour force data shape:", dim(work), "\n")Labour force data shape: 2088 5
Show Code
cat("Outside labour force data shape:", dim(not_working), "\n")Outside labour force data shape: 2088 5
4.2 Data Cleaning & Transformation
The following steps will be taken to clean and reshape “fertility”:
“
fertility” tibble contains “na” strings which are not actually NA values, these points will need to be converted to NA valuesfertility rate data from SingStat is in wide format with years as the columns, we will pivot long for year-wise plots
fertility rate data goes up till 2024, whereas the labour force data only goes up till 2022, we will filter the fertility rate data to only include years after 1990 and up till 2022
standardise age banding of fertility rate dataset to be consistent with labour force data. For example, “15-19” instead of “15 - 19 Years (Per Thousand Females)’ and also keep Total Fertility Rate data (aggregated across all age bands)
filtered to include age specific fertility rates and the total fertility rate by year
introduce Unit of Measurement (uom) column to indicate scaling for Total Fertility Rates and age banded fertility rates
The following steps will be taken to clean and reshape “not_working”:
standardise column names to the 7 (15-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49) age bands to be consistent with fertility and remove extra bandings
for labour datasets, divide labour_force values by 1000 to align with count (in thousands) y-axis variable
some outside_labour_force values are “-” which are not valid numerics, convert these to NA
rename age column to age_band to match
fertilityaggregate age bands to introduce “All” to represent population outside labour force by year and marital status only, this is so that we can introduce interactivity with Total Fertility Rate and fertility rates across age bands
“work” tibble is cleaned in a similar way to “not_working”.
Show Code
# Enhanced fertility data cleaning
fertility_clean <- fertility |>
clean_names() |>
rename(measure = data_series) |>
mutate(across(-measure, as.character)) |>
pivot_longer(
cols = -measure,
names_to = "year",
values_to = "value"
) |>
mutate(
year = as.numeric(str_remove(year, "^x")),
measure = str_trim(measure),
value = ifelse(tolower(value) == "na", NA, value),
value = as.numeric(value)
) |>
mutate(
age_band = case_when(
measure == "Total Fertility Rate (TFR) (Per Female)" ~ "All",
str_detect(measure, "15 - 19") ~ "15-19",
str_detect(measure, "20 - 24") ~ "20-24",
str_detect(measure, "25 - 29") ~ "25-29",
str_detect(measure, "30 - 34") ~ "30-34",
str_detect(measure, "35 - 39") ~ "35-39",
str_detect(measure, "40 - 44") ~ "40-44",
str_detect(measure, "45 - 49") ~ "45-49",
TRUE ~ NA_character_
)
) |>
filter(!is.na(age_band)) |>
mutate(
uom = case_when(
age_band == "All" ~ "per female",
TRUE ~ "per thousand females"
)
) |>
filter(year >= 1990 & year <= 2020) |>
select(year, age_band, fertility_rate = value, uom)
# Enhanced labour force data cleaning
clean_labour_data <- function(data, value_col) {
data |>
clean_names() |>
filter(age %in% c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")) |>
mutate(
!!value_col := na_if(!!sym(value_col), "-"),
!!value_col := as.numeric(!!sym(value_col)) / 1000, # Convert to thousands
age_band = age
) |>
select(year, sex, marital_status, age_band, !!value_col)
}
work_clean <- clean_labour_data(work, "labour_force")
not_working_clean <- clean_labour_data(not_working, "outside_labour_force")
# Create aggregated totals
create_totals <- function(data, value_col) {
data |>
group_by(year, sex, marital_status) |>
summarise(
age_band = "All",
!!value_col := sum(!!sym(value_col), na.rm = TRUE),
.groups = "drop"
)
}
work_all <- create_totals(work_clean, "labour_force")
not_working_all <- create_totals(not_working_clean, "outside_labour_force")
# Combine data
work_clean <- bind_rows(work_clean, work_all)
not_working_clean <- bind_rows(not_working_clean, not_working_all)
cat("✓ Data cleaning completed successfully\n")✓ Data cleaning completed successfully
5 Data Quality Assessment
5.1 Missing Data Analysis
Missing Data Summary:
Show Code
# Check for missing data patterns
missing_analysis <- list(
fertility = fertility_clean |> summarise(across(everything(), ~sum(is.na(.)))),
work = work_clean |> summarise(across(everything(), ~sum(is.na(.)))),
not_working = not_working_clean |> summarise(across(everything(), ~sum(is.na(.))))
)
cat("Fertility data missing values:", sum(is.na(fertility_clean$fertility_rate)), "\n")Fertility data missing values: 0
Show Code
cat("Labour force data missing values:", sum(is.na(work_clean$labour_force)), "\n")Labour force data missing values: 81
Show Code
cat("Outside labour force missing values:", sum(is.na(not_working_clean$outside_labour_force)), "\n")Outside labour force missing values: 179
The missing values in the labour datasets are caused by combinations of variables that result in highly likely scenarios where the count is actually ‘0’ such as the case of “widowed/divorced” in the age band of “15-19”. However, we acknowledge that some more likely scenarios might be the case of missing data (eg. 2022, male, outside labour force, widowed/divorced, 35-39).
5.2 Outlier Detection & Analysis
Show Code
# Enhanced outlier detection function
detect_outliers_iqr <- function(df, value_col, group_cols) {
df |>
group_by(across(all_of(group_cols))) |>
mutate(
Q1 = quantile(.data[[value_col]], 0.25, na.rm = TRUE),
Q3 = quantile(.data[[value_col]], 0.75, na.rm = TRUE),
IQR = Q3 - Q1,
lower_bound = Q1 - 1.5 * IQR,
upper_bound = Q3 + 1.5 * IQR,
is_outlier = .data[[value_col]] < lower_bound | .data[[value_col]] > upper_bound
) |>
ungroup()
}
# Apply outlier detection
fertility_outliers <- fertility_clean |>
filter(age_band != "All") |>
detect_outliers_iqr("fertility_rate", "age_band")
work_outliers <- work_clean |>
filter(age_band != "All", sex == "female") |>
detect_outliers_iqr("labour_force", c("age_band", "marital_status"))
# Outlier summary
outlier_summary <- data.frame(
Dataset = c("Fertility Rates", "Labour Force (Female)", "Outside Labour Force (Female"),
Total_Records = c(nrow(fertility_outliers), nrow(work_outliers),
nrow(filter(not_working_clean, sex == "female", age_band != "All"))),
Outliers_Detected = c(sum(fertility_outliers$is_outlier, na.rm = TRUE),
sum(work_outliers$is_outlier, na.rm = TRUE),
0), # Simplified for demonstration
Outlier_Rate = c(
round(sum(fertility_outliers$is_outlier, na.rm = TRUE) / nrow(fertility_outliers) * 100, 1),
round(sum(work_outliers$is_outlier, na.rm = TRUE) / nrow(work_outliers) * 100, 1),
0
)
)
# Create the table using gt
outlier_summary_gt <- outlier_summary |>
gt() |>
cols_label(
Dataset = "Dataset",
Total_Records = "Total Records",
Outliers_Detected = "Outliers Detected",
Outlier_Rate = "Outlier Rate (%)"
) |>
tab_style(
style = list(
cell_text(weight = "bold") # Bold column headers
),
locations = cells_column_labels(columns = everything()) # Apply to all column headers
) |>
tab_options(
table.font.size = 12,
table.width = pct(80),
table.layout = "auto"
) |>
opt_table_font(
font = "Arial"
)
# Display the table
outlier_summary_gt| Dataset | Total Records | Outliers Detected | Outlier Rate (%) |
|---|---|---|---|
| Fertility Rates | 217 | 3 | 1.4 |
| Labour Force (Female) | 609 | 20 | 3.3 |
| Outside Labour Force (Female | 609 | 0 | 0.0 |
5.3 Outlier Visualisation
Show Code
# Enhanced outlier visualisation
p_outliers <- ggplot(fertility_outliers,
aes(x = year, y = fertility_rate, color = age_band)) +
geom_line(linewidth = 0.8, alpha = 0.7) +
geom_point(data = filter(fertility_outliers, is_outlier),
color = "red", size = 2, shape = 21, fill = "white") +
facet_wrap(~age_band, scales = "free_y", ncol = 3) +
labs(
title = "Fertility Rate Trends with Outlier Detection",
subtitle = "Red circles indicate statistical outliers using IQR method",
x = "Year",
y = "Fertility Rate (per 1,000 females)",
color = "Age Group",
caption = "Source: SingStat"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
strip.text = element_text(face = "bold"),
legend.position = "none"
) +
scale_color_viridis_d()
print(p_outliers)This shows that there is uncharacteristically high fertility rate in the 45-49 year old age group in recent times (2017-2020).
Show Code
status_labels <- c(
"married" = "Married",
"single" = "Single",
"widowed_divorced" = "Widowed/Divorced"
)
# Modify the plot code
p_labour_outliers <- ggplot(work_outliers,
aes(x = year, y = labour_force, color = age_band)) +
geom_line(linewidth = 0.8, alpha = 0.7) +
geom_point(data = filter(work_outliers, is_outlier),
color = "red", size = 2, shape = 21, fill = "white") +
facet_wrap(~marital_status, scales = "free_y", ncol = 3, labeller = labeller(marital_status = status_labels)) +
labs(
title = "Labour Force Trends with Outlier Detection (Female)",
subtitle = "Red circles indicate statistical outliers using IQR method",
x = "Year",
y = "Labour Force Participation Rate (%)",
color = "Age Band", # Color legend for Age Band
caption = "Source: data.gov.sg"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 14),
strip.text = element_text(face = "bold"),
legend.position = "right", # Keep the legend for Age Band
legend.title = element_blank(), # Remove legend title
) +
scale_color_viridis_d(option = "D") +
scale_x_continuous(
breaks = seq(min(work_outliers$year), max(work_outliers$year), by = 10), # Set x-axis breaks to 10-year intervals
labels = seq(min(work_outliers$year), max(work_outliers$year), by = 10) # Label the x-axis at 10-year intervals
)
print(p_labour_outliers)Majority of the outliers were single and occurred in the early 1990s.
6 Data Integration & Final Dataset
6.1 Data Integration Strategy
We will join the datasets together to create a single tibble that contains all the necessary information for our visualisation. The joined tibble will contain the following columns:
year: from 1991 to 2022age_band: Age bands and “All” which is for total fertility ratemarital_status: Marital status of the data pointfertility_rate: Fertility rate by age band (per thousand females) and total fertility rate (per female)uom: Fertility rate unit of measurementlabour_status: Labour status of the data point, either “labour_force” or “outside_labour_force”count: Number of females either in workforce or outside workforce (in thousands)
6.1.1 Filter to Female Population Only
Show Code
# Filter labour data to only include females
work_clean_female <- work_clean |>
filter(sex == "female") |>
select(-sex)
not_working_clean_female <- not_working_clean |>
filter(sex == "female") |>
select(-sex)
cat("✓ Filtered to female population only\n")✓ Filtered to female population only
Show Code
cat("Working females data shape:", dim(work_clean_female), "\n")Working females data shape: 696 4
Show Code
cat("Non-working females data shape:", dim(not_working_clean_female), "\n")Non-working females data shape: 696 4
6.1.2 Combine Labour Force Data
A full_join() is used to combine both work_clean_female and not_working_clean_female tibbles, ensuring that all rows from both tibbles are included to combine the labour force columns. The join is done on the year, marital_status, and age_band columns, common dimensions to both tibbles to prevent any data loss.
Show Code
# Combine female labour and not working into one tibble
labour_status_female <- full_join(
work_clean_female,
not_working_clean_female,
by = c("year", "marital_status", "age_band")
)
cat("✓ Combined labour force data successfully\n")✓ Combined labour force data successfully
Show Code
cat("Combined labour data shape:", dim(labour_status_female), "\n")Combined labour data shape: 696 5
6.1.3 Join with Fertility Data
A left_join() is used joining the fertility_clean tibble to the labour_status_female tibble, ensuring that all rows from fertility_clean are included. This will allow us to combine and be able to associate fertility rates with labour force participation data.
Show Code
# Join fertility data with labour status data
fertility_labour_joined <- fertility_clean |>
left_join(labour_status_female, by = c("year", "age_band"))
cat("✓ Joined fertility and labour data successfully\n")✓ Joined fertility and labour data successfully
Show Code
cat("Joined data shape:", dim(fertility_labour_joined), "\n")Joined data shape: 680 7
6.1.4 Transform to Long Format
Conversion of labour_force and outside_labour_force columns to have a single column dictating labour status. Years that do not have corresponding labour force data (1995, 2000, 2005) are filtered out as noted in our disclaimer.
Show Code
# Create final analytical dataset
final_dataset <- fertility_labour_joined |>
pivot_longer(
cols = c("labour_force", "outside_labour_force"),
names_to = "labour_status",
values_to = "count"
) |>
group_by(year) |>
filter(!all(is.na(count))) |> # Remove years with no labour data (1995, 2000, 2005)
ungroup() |>
mutate(
count = replace_na(count, 0),
) |>
filter(!is.na(fertility_rate)) # Remove rows with missing fertility data6.1.5 Data Quality Validation
Evaluate the final_dataset for total number of records, unique values in each column, presence of missing values (“NA”)
Show Code
summary_table <- tibble(
Column = names(final_dataset),
Total_Records = nrow(final_dataset),
Unique_Values = sapply(final_dataset, function(x) length(unique(x))),
Missing_Values = sapply(final_dataset, function(x) sum(is.na(x)))
)
if ("year" %in% names(final_dataset)) {
years_present <- sort(unique(final_dataset$year[!is.na(final_dataset$year)]))
yr_min <- min(years_present)
yr_max <- max(years_present)
full_years <- seq(yr_min, yr_max)
missing_years <- setdiff(full_years, years_present)
missing_txt <- if (length(missing_years)) {
paste(missing_years, collapse = ", ")
} else {
"None"
}
footer_note <- paste0(
"Year range: ", yr_min, "–", yr_max,
" | Missing years: ", missing_txt
)
} else {
footer_note <- NULL
}
# 4. Render as a gt table with footer
gt_tbl <- summary_table |>
gt() |>
cols_label(
Column = md("**Column**"),
Total_Records = md("**Total Records**"),
Unique_Values = md("**Unique Values**"),
Missing_Values = md("**Missing Values**")
) |>
tab_header(
title = "Dataset Structure & Completeness Overview"
)
if (!is.null(footer_note)) {
gt_tbl <- gt_tbl |>
tab_source_note(
source_note = footer_note
)
}
gt_tblfinal_dataset Tibble
| Dataset Structure & Completeness Overview | |||
|---|---|---|---|
| Column | Total Records | Unique Values | Missing Values |
| year | 1296 | 27 | 0 |
| age_band | 1296 | 8 | 0 |
| fertility_rate | 1296 | 158 | 0 |
| uom | 1296 | 2 | 0 |
| marital_status | 1296 | 3 | 0 |
| labour_status | 1296 | 2 | 0 |
| count | 1296 | 595 | 0 |
| Year range: 1991–2020 | Missing years: 1995, 2000, 2005 | |||
6.1.6 Create Aggregated Totals for Analysis
Show Code
# Create aggregated totals function for reusability
create_totals <- function(data, value_col) {
data |>
group_by(year, sex, marital_status) |>
summarise(
age_band = "All",
!!value_col := sum(!!sym(value_col), na.rm = TRUE),
.groups = "drop"
)
}
# Apply to both datasets for comprehensive analysis
work_all <- create_totals(work_clean, "labour_force")
not_working_all <- create_totals(not_working_clean, "outside_labour_force")
# Combine with existing data
work_complete <- bind_rows(work_clean, work_all)
not_working_complete <- bind_rows(not_working_clean, not_working_all)
cat("✓ Created aggregated totals for comprehensive analysis\n")✓ Created aggregated totals for comprehensive analysis
Show Code
cat("Work data with totals shape:", dim(work_complete), "\n")Work data with totals shape: 1566 5
Show Code
cat("Not working data with totals shape:", dim(not_working_complete), "\n")Not working data with totals shape: 1566 5
6.2 Dataset Integration Results
The final integrated dataset successfully combines:
- Fertility rates from SingStat (1990-2022)
- Female labour force participation from data.gov.sg
- Demographic breakdowns by age group and marital status in time series
This integrated dataset forms the foundation for our comprehensive analysis of Singapore’s fertility crisis and its relationship with socioeconomic factors. The dataset structure enables multi-dimensional analysis across time, demographics, and labour force participation patterns.
Show Code
datatable(
final_dataset,
class = "compact stripe hover", # make rows & font more compact
extensions = 'Buttons',
filter = "none", # turn off per-column filters
options = list(
pageLength = 6,
scrollX = TRUE,
dom = 'Bfrtip', # Buttons, global filter, table, info, pagination
buttons = list(
list(extend = 'csv', text = 'Export CSV')
)
),
rownames = FALSE
)final_dataset
7 Statistical Analysis
Show Code
#| echo: true
#| eval: true
#| label: tbl-6
#| tbl-cap: "Correlation Matrix"
# Calculate correlations between key variables
correlation_data <- final_dataset |>
filter(age_band == "All") |>
group_by(year, labour_status) |>
summarise(
fertility_rate = first(fertility_rate),
total_count = sum(count, na.rm = TRUE),
.groups = "drop"
) |>
pivot_wider(
names_from = labour_status,
values_from = total_count
) |>
mutate(
labour_participation_rate = labour_force / (labour_force + outside_labour_force),
total_female_population = labour_force + outside_labour_force
)
# Calculate correlation matrix
cor_matrix <- correlation_data |>
select(fertility_rate,
labour_participation_rate,
labour_force,
outside_labour_force) |>
cor(use = "complete.obs") |>
round(3)
# Turn it into a tibble for gt
cor_tbl <- as.data.frame(cor_matrix) |>
rownames_to_column(var = "Variable") |>
as_tibble()
# Render with gt
cor_tbl |>
gt(rowname_col = "Variable") |>
tab_header(
title = md("**Correlation Matrix: Key Variables**")
) |>
fmt_number(
columns = everything(),
decimals = 3
)| Correlation Matrix: Key Variables | ||||
|---|---|---|---|---|
| fertility_rate | labour_participation_rate | labour_force | outside_labour_force | |
| fertility_rate | 1.000 | −0.875 | −0.936 | 0.672 |
| labour_participation_rate | −0.875 | 1.000 | 0.977 | −0.932 |
| labour_force | −0.936 | 0.977 | 1.000 | −0.835 |
| outside_labour_force | 0.672 | −0.932 | −0.835 | 1.000 |
Show Code
# Key correlation insights
cat("• Fertility Rate vs Labour Participation Rate:",
cor_matrix["fertility_rate", "labour_participation_rate"], "\n")• Fertility Rate vs Labour Participation Rate: -0.875
Show Code
cat("• Fertility Rate vs Labour Force:",
cor_matrix["fertility_rate", "labour_force"], "\n")• Fertility Rate vs Labour Force: -0.936
Show Code
cat("• Fertility Rate vs Outside Labour Force:",
cor_matrix["fertility_rate", "outside_labour_force"], "\n")• Fertility Rate vs Outside Labour Force: 0.672
There is a strong negative correlation between fertility rate and both female labour participation and overall female labour force size. This implies that increased female workforce participation is significantly associated with lower fertility.
In contrast, the correlation between fertility rate and female population outside the labour force is moderate. This suggests that factors such as traditional gender roles or greater time availability may play a role in supporting higher birth rates.
Trend Analysis
Show Code
# Calculate year-over-year changes
trend_analysis <- final_dataset |>
filter(age_band == "All") |>
group_by(year, labour_status) |>
summarise(
fertility_rate = first(fertility_rate),
total_count = sum(count, na.rm = TRUE),
.groups = "drop"
) |>
arrange(year) |>
mutate(
fertility_change = fertility_rate - lag(fertility_rate),
fertility_pct_change = (fertility_rate - lag(fertility_rate)) / lag(fertility_rate) * 100
)
# Summary statistics
summary_stats <- trend_analysis |>
filter(!is.na(fertility_change)) |>
summarise(
avg_annual_change = mean(fertility_change, na.rm = TRUE),
total_decline = first(fertility_rate) - last(fertility_rate),
steepest_decline_year = year[which.min(fertility_change)],
steepest_decline_value = min(fertility_change, na.rm = TRUE)
)
cat("• Average annual fertility decline:", round(summary_stats$avg_annual_change, 4), "per year\n")• Average annual fertility decline: -0.0119 per year
Show Code
cat("• Total fertility decline (1990-2020):", round(summary_stats$total_decline, 2), "\n")• Total fertility decline (1990-2020): 0.63
Show Code
cat("• Steepest decline occurred in:", summary_stats$steepest_decline_year, "\n")• Steepest decline occurred in: 1998
Show Code
cat("• Steepest decline value:", round(summary_stats$steepest_decline_value, 3), "\n")• Steepest decline value: -0.13
The year 1998 emerges as a key inflection point, likely influenced by the Asian Financial Crisis. The crisis introduced economic uncertainty, which may have delayed family planning decisions.
Overall, the data show a consistent downward trend in fertility, reflecting structural changes in: Social norms, Career aspirations & Marriage and childbirth timing
8 Enhanced Data Visualisation
8.1 Static Visualisation
Show Code
viz_data <- final_dataset |>
group_by(year, marital_status, labour_status) |>
summarise(count = sum(count, na.rm = TRUE), .groups = "drop") |>
mutate(
status = fct_inorder(paste(marital_status, labour_status, sep = " / "))
)
fertility_line_data <- final_dataset |>
filter(age_band == "All") |>
distinct(year, fertility_rate) |>
mutate(
fertility_shift = fertility_rate - 1
)
max_count <- max(viz_data$count, na.rm = TRUE)
scale_factor <- max_count
ggplot() +
geom_col(
data = viz_data,
aes(x = year, y = count, fill = status),
position = position_dodge(width = 0.8),
colour = "white", size = 0.2
) +
# Fertility line + points, using shifted & scaled fertility
geom_line(
data = fertility_line_data,
aes(x = year, y = fertility_shift * scale_factor),
colour = "gray69", size = 1
) +
geom_point(
data = fertility_line_data,
aes(x = year, y = fertility_shift * scale_factor),
colour = "gray45", size = 1
) +
# Primary + secondary axes
scale_y_continuous(
name = "Female Population (thousands)",
labels = comma,
limits = c(0, max_count),
expand = expansion(c(0, 0)),
sec.axis = sec_axis(
# inverse of y = (fertility - 1) * scale_factor
transform = ~ . / scale_factor + 1,
name = "Total Fertility Rate (per female)",
breaks = seq(1, 2, by = 0.25),
labels = label_number(accuracy = 0.1)
)
) +
# Manual fill colours
scale_fill_manual(
name = "Marital Status / Labour Status",
values = c(
"married / labour_force" = "green3",
"married / outside_labour_force" = "darkgreen",
"single / labour_force" = "red2",
"single / outside_labour_force" = "darkred",
"widowed_divorced / labour_force" = "blue1",
"widowed_divorced / outside_labour_force" = "blue4"
),
labels = c(
"Married – In Labour Force",
"Married – Outside Labour Force",
"Single – In Labour Force",
"Single – Outside Labour Force",
"Divorced/Widowed – In Labour Force",
"Divorced/Widowed – Outside Labour Force"
)
) +
labs(
title = "Singapore's Fertility Crisis: Labour Force vs TFR (1991–2020)",
x = "Year",
caption = "Data: SingStat & data.gov.sg"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 16),
axis.title = element_text(face = "bold"),
legend.position = "bottom",
legend.title = element_text(face = "bold")
)The above graph shows the relationship between Total Fertility Rate(per female), Female population(thousands) and Marital/Labour Status.
8.2 Interactive Dashboard
To introduce interactivity, we will create an interactive dashboard using plotly to allow users to filter by marital status, labour status, and age band.
this code allows for clicking of the legend trace and labels to filter the data. but it has the (“label”,1) naming issue. i gave up on this, theres no way to use ggplot2 and ggplotly to fix the legend. must use plotly or shiny or somehing else. I am trying to create a custom legend below and then slowly integrate the dropdown across all variables later in the next code chunk.
Show Code
# 1. Aggregate labour counts across ALL age bands for each marital_status × labour_status
agg_counts <- final_dataset |>
group_by(year, marital_status, labour_status) |>
summarise(count = sum(count, na.rm = TRUE), .groups = "drop") |>
mutate(
status = fct_inorder(paste(marital_status, labour_status, sep = " / "))
)
# 2. Create Crosstalk shared dataset with proper key
shared_counts <- SharedData$new(
agg_counts,
key = ~interaction(marital_status, labour_status),
group = "labour_group"
)
# 3. Create separate filters for marital and labour status
filter_marital <- filter_select(
id = "marital_filter",
label = "Select Marital Status:",
sharedData = shared_counts,
group = ~marital_status
)
filter_labour <- filter_select(
id = "labour_filter",
label = "Select Labour Status:",
sharedData = shared_counts,
group = ~labour_status
)
# 4. Fertility lines by age_band (excluding "All" total)
fertility_ab <- final_dataset |>
filter(age_band != "All") |>
distinct(year, age_band, fertility_rate)
# Create separate shared data for lines
shared_lines <- SharedData$new(fertility_ab, key = ~age_band, group = "age_group")
# 5. Age band filter
filter_age <- filter_select(
id = "age_filter",
label = "Select Age Band:",
sharedData = shared_lines,
group = ~age_band
)
# 6. Compute scale_factor
max_count <- max(agg_counts$count, na.rm = TRUE)
max_fertility <- max(fertility_ab$fertility_rate, na.rm = TRUE)
scale_factor <- max_count / max_fertility
# 7. Build the ggplot
p_ab <- ggplot() +
# Bars (using shared_counts)
geom_col(
data = shared_counts,
aes(
x = year,
y = count,
fill = status,
group = status,
text = paste0(
"Year: ", year,
"<br>Marital: ", marital_status,
"<br>Labour: ", labour_status,
"<br>Count: ", comma(count)
)
),
position = position_dodge(width = 0.8),
colour = "white",
size = 0.2,
alpha = 0.8
) +
# Lines (using shared_lines)
geom_line(
data = shared_lines,
aes(
x = year,
y = fertility_rate * scale_factor,
colour= age_band,
group = age_band,
text = paste0(
"Year: ", year,
"<br>Age band: ", age_band,
"<br>Fertility: ", round(fertility_rate,2)
)
),
size = 0.8,
alpha = 0.8
) +
# Points
geom_point(
data = shared_lines,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
text = paste0(
"Year: ", year,
"<br>Age band: ", age_band,
"<br>Fertility: ", round(fertility_rate,2)
)
),
size = 1
) +
# Dual axis setup
scale_y_continuous(
name = "Female Population (thousands)",
labels = comma,
sec.axis = sec_axis(
transform = ~ . / scale_factor,
name = "Fertility Rate (per thousand females)",
labels = label_number(accuracy = 0.1)
)
) +
# Color scales
scale_fill_manual(
name = "Marital & Labour Status",
values = c(
"married / labour_force" = "green3",
"married / outside_labour_force" = "darkgreen",
"single / labour_force" = "red2",
"single / outside_labour_force" = "darkred",
"widowed_divorced / labour_force" = "blue1",
"widowed_divorced / outside_labour_force" = "blue4"
)
) +
scale_colour_brewer(
palette = "Set2",
name = "Age Band"
) +
# Labels and theme
labs(
title = "Age‐Banded Fertility Rates vs Female Labour Counts",
subtitle = "Bars by Marital & Labour Status; Lines by Age Band",
x = "Year",
caption = "Data: SingStat & data.gov.sg"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(size = 12, color = "gray60"),
axis.title = element_text(face = "bold", size = 6),
legend.position = "none", # We'll use custom legend instead
panel.grid.minor = element_blank()
)
# 8. Convert to Plotly
interactive_ab <- ggplotly(p_ab, tooltip = "text") %>%
layout(
margin = list(l = 80, r = 80, b = 200, t = 80) # Extra bottom space
)
# 9. Disable built-in legend for data traces
for(i in seq_along(interactive_ab$x$data)) {
interactive_ab$x$data[[i]]$showlegend <- FALSE
}
# 10. Create custom legend traces
# Marital/Labour status legend (squares)
status_combinations <- list(
list(name = "Married - In Labour", color = "green3", key = "married / labour_force"),
list(name = "Married - Not Working", color = "darkgreen", key = "married / outside_labour_force"),
list(name = "Single - In Labour", color = "red2", key = "single / labour_force"),
list(name = "Single - Not Working", color = "darkred", key = "single / outside_labour_force"),
list(name = "Divorced/Widowed - In Labour", color = "blue1", key = "widowed_divorced / labour_force"),
list(name = "Divorced/Widowed - Not Working", color = "blue4", key = "widowed_divorced / outside_labour_force")
)
status_traces <- lapply(status_combinations, function(item) {
list(
name = item$name,
legendgroup= "status_legend",
x = c(NA), # Single invisible point
y = c(NA),
type = "scatter",
mode = "markers",
marker = list(
color = item$color,
symbol = "square",
size = 8
),
showlegend = TRUE
)
})
# Age‐band legend (lines)
age_bands <- c("15-19","20-24","25-29","30-34","35-39","40-44","45-49")
age_colors <- RColorBrewer::brewer.pal(length(age_bands), "Set2")
age_traces <- lapply(seq_along(age_bands), function(i) {
list(
name = age_bands[i],
legendgroup= "age_legend",
x = c(NA), # Single invisible point
y = c(NA),
type = "scatter",
mode = "lines",
line = list(color = age_colors[i], width = 3),
showlegend = TRUE
)
})
# 11. Append custom legend traces
interactive_ab$x$data <- c(
interactive_ab$x$data,
status_traces,
age_traces
)
# 12. Configure legend layout
interactive_ab <- interactive_ab %>%
layout(
legend = list(
orientation = "h",
y = -0.25, # Position below chart
x = 0.5,
xanchor = "center",
traceorder = "normal",
itemsizing = "constant",
font = list(size = 10),
itemwidth = 30
)
)
# 13. Render final visualization
tagList(
div(style = "margin-bottom:20px;",
h3("Interactive Fertility Dashboard"),
p("Use dropdowns to filter by marital status, labour status, and age band")
),
div(style = "display: flex; flex-wrap: wrap; gap: 10px; margin-bottom: 10px;",
div(style = "flex: 1; min-width: 200px;", filter_marital),
div(style = "flex: 1; min-width: 200px;", filter_labour),
div(style = "flex: 1; min-width: 200px;", filter_age)
),
interactive_ab
)Interactive Fertility Dashboard
Use dropdowns to filter by marital status, labour status, and age band
The first iteration of the interactive dashboard correctly uses plotly to highlight certain parts of the graph, however, fails to correctly filter the data based on the selected dropdown categories. Additionally, the legend is not properly labelled, with a (,1) appearing at the end of legend labels.
8.3 Interactive Dashboard but with custom legend label
The second iteration successfully implements a custom legend that correctly labels the legend items, but still does not filter the data based on the selected dropdown categories. Additionally, the legend completely disappears when changing filtering that is not the base graph.
so far so good, need to implement genuine interactivity now.
Show Code
# 1. Aggregate labour counts across ALL age bands for each marital_status × labour_status
agg_counts <- final_dataset |>
group_by(year, marital_status, labour_status) |>
summarise(count = sum(count, na.rm = TRUE), .groups = "drop") |>
mutate(
status = fct_inorder(paste(marital_status, labour_status, sep = " / "))
)
# 2. Crosstalk SharedData & filter
shared_counts <- SharedData$new(agg_counts, group = "labour_status_ab")
filter_ab <- filter_select(
id = "labour_ab_filter",
label = "Select Labour Status:",
sharedData = shared_counts,
group = ~labour_status
)
# 3. Fertility lines by age_band (excluding the "All" total)
fertility_ab <- final_dataset |>
filter(age_band != "All") |>
distinct(year, age_band, fertility_rate)
# 4. Compute scale_factor so that max(fertility_rate) → max(count)
max_count <- max(agg_counts$count, na.rm = TRUE)
max_fertility <- max(fertility_ab$fertility_rate, na.rm = TRUE)
scale_factor <- max_count / max_fertility
# 5. Build the ggplot
p_ab <- ggplot() +
geom_col(
data = shared_counts,
aes(
x = year,
y = count,
fill = status,
group = status,
legendgroup = status,
text = paste0(
"Year: ", year,
"<br>Status: ", tools::toTitleCase(gsub("_"," ", status)),
"<br>Count: ", comma(count)
)
),
position = position_dodge(width = 0.8),
colour = "white",
size = 0.2,
alpha = 0.8
) +
geom_line(
data = fertility_ab,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
legendgroup = age_band,
text = paste0(
"Year: ", year,
"<br>Age band: ", age_band,
"<br>Fertility: ", round(fertility_rate,2)
)
),
size = 0.8,
alpha = 0.8
) +
geom_point(
data = fertility_ab,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
legendgroup = age_band,
text = paste0(
"Year: ", year,
"<br>Age band: ", age_band,
"<br>Fertility: ", round(fertility_rate,2)
)
),
size = 1
) +
scale_y_continuous(
name = "Female Population (thousands)",
labels = comma,
sec.axis = sec_axis(
transform = ~ . / scale_factor,
name = "Fertility Rate (per thousand females)",
labels = label_number(accuracy = 0.1)
)
) +
scale_fill_manual(
name = "Marital Status / Labour Status",
values = c(
"married / labour_force" = "#00CD00", # green3
"married / outside_labour_force" = "#006400", # darkgreen
"single / labour_force" = "#EE0000", # red2
"single / outside_labour_force" = "#8B0000", # darkred
"widowed_divorced / labour_force" = "#0000FF", # blue1
"widowed_divorced / outside_labour_force" = "#00008B" # blue4
)
) +
scale_colour_brewer(
palette = "Set2",
name = "Age Band"
) +
labs(
title = "Age‐Banded Fertility Rates vs Female Labour Counts",
subtitle = "Bars by Marital & Labour Status; Lines by Age Band",
x = "Year",
caption = "Data: SingStat & data.gov.sg"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(size = 12, color = "gray60"),
axis.title = element_text(face = "bold", size = 6),
legend.position = "bottom",
panel.grid.minor = element_blank()
)
# 6. Convert to Plotly
interactive_ab <- ggplotly(p_ab, tooltip = "text") %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
xanchor = "center",
y = -0.25,
font = list(size = 8),
title = list(
text = "Marital Status / Labour Status",
font = list(size = 10)
),
itemwidth = 70
),
margin = list(
l = 80, # left
r = 80, # right
b = 140, # bottom
t = 80
)
)
# 7. Turn off built-in legends
for(i in seq_along(interactive_ab$x$data)) {
interactive_ab$x$data[[i]]$showlegend <- FALSE
}
# 8. Define EXACT color mapping for status categories
status_colors <- c(
"married / labour_force" = "#00CD00", # green3
"married / outside_labour_force" = "#006400", # darkgreen
"single / labour_force" = "#EE0000", # red2
"single / outside_labour_force" = "#8B0000", # darkred
"widowed_divorced / labour_force" = "#0000FF", # blue1
"widowed_divorced / outside_labour_force" = "#00008B" # blue4
)
status_labels <- c(
"Married – In Labour Force",
"Married – Outside Labour Force",
"Single – In Labour Force",
"Single – Outside Labour Force",
"Divorced/Widowed – In Labour Force",
"Divorced/Widowed – Outside Labour Force"
)
# 9. Create status legend using exact hex colors
status_traces <- lapply(1:6, function(i) {
list(
x = NA,
y = NA,
name = status_labels[i],
type = "scatter",
mode = "markers",
marker = list(
color = status_colors[i],
symbol = "square",
size = 8
),
showlegend = TRUE,
hoverinfo = "none" # Prevent hover info on legend items
)
})
# 10. Create age-band legend with explicit colors
age_bands <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")
age_colors <- RColorBrewer::brewer.pal(length(age_bands), "Set2")
age_traces <- lapply(seq_along(age_bands), function(i) {
list(
x = NA,
y = NA,
name = age_bands[i],
type = "scatter",
mode = "lines",
line = list(color = age_colors[i], width = 2),
showlegend = TRUE,
hoverinfo = "none" # Prevent hover info on legend items
)
})
# 11. Append custom legend traces
interactive_ab$x$data <- c(
interactive_ab$x$data,
status_traces,
age_traces
)
# 12. Adjust layout for better legend display
interactive_ab <- interactive_ab %>%
layout(
legend = list(
orientation = "h",
x = 0.5,
xanchor = "center",
y = -0.3, # Move legend lower
font = list(size = 8),
traceorder = "normal", # Keep specified order
itemsizing = "constant", # Consistent item sizes
itemwidth = 30
),
margin = list(
l = 80,
r = 80,
b = 180, # Increase bottom margin for legend space
t = 80
)
)
# 13. Render final visualization
tagList(
div(style = "margin-bottom:20px;",
h3("Interactive Age‐Banded Fertility Dashboard"),
p("Use the dropdown to filter bars by labour status.")
),
filter_ab,
interactive_ab
)Interactive Age‐Banded Fertility Dashboard
Use the dropdown to filter bars by labour status.
8.4 Fix needed
After attempting different methods, we decided that using html to manually create the legends was the last and most plausible option. As the legends are not externally created, they persist through filtering. However, certain parts that should be kept such as age_band lines do not persist through filtering.
Legend persists through all toggle iterations but to fix to make age_band lines also persist
Show Code
# 1. Aggregate labour counts across ALL age bands for each marital_status × labour_status
agg_counts <- final_dataset |>
group_by(year, marital_status, labour_status) |>
summarise(count = sum(count, na.rm = TRUE), .groups = "drop") |>
mutate(
status = fct_inorder(paste(marital_status, labour_status, sep = " / "))
)
# 2. Create SharedData ONLY for the bar data (not the lines)
shared_bars <- SharedData$new(agg_counts, group = "labour_status_ab")
# 3. Create filter for bars only
filter_ab <- filter_select(
id = "labour_ab_filter",
label = "Select Labour Status:",
sharedData = shared_bars,
group = ~labour_status
)
# 4. Fertility lines by age_band (excluding the "All" total)
fertility_ab <- final_dataset |>
filter(age_band != "All") |>
distinct(year, age_band, fertility_rate)
# 5. Compute scale_factor so that max(fertility_rate) → max(count)
max_count <- max(agg_counts$count, na.rm = TRUE)
max_fertility <- max(fertility_ab$fertility_rate, na.rm = TRUE)
scale_factor <- max_count / max_fertility
# 6. Build the ggplot with separate data sources
p_ab <- ggplot() +
# Bars - these will be filtered by crosstalk
geom_col(
data = shared_bars, # Only the bars use SharedData
aes(
x = year,
y = count,
fill = status,
group = status,
text = paste0(
"Year: ", year,
"<br>Status: ", tools::toTitleCase(gsub("_"," ", status)),
"<br>Count: ", comma(count)
)
),
position = position_dodge(width = 0.8),
colour = "white",
size = 0.2,
alpha = 0.8
) +
# Lines - these use regular data and won't be filtered
geom_line(
data = fertility_ab, # Regular data frame, not SharedData
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
text = paste0(
"Year: ", year,
"<br>Age band: ", age_band,
"<br>Fertility: ", round(fertility_rate,2)
)
),
size = 0.8,
alpha = 0.8
) +
# Points - these use regular data and won't be filtered
geom_point(
data = fertility_ab, # Regular data frame, not SharedData
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
text = paste0(
"Year: ", year,
"<br>Age band: ", age_band,
"<br>Fertility: ", round(fertility_rate,2)
)
),
size = 1
) +
scale_y_continuous(
name = "Female Population (thousands)",
labels = comma,
sec.axis = sec_axis(
transform = ~ . / scale_factor,
name = "Fertility Rate (per thousand females)",
labels = label_number(accuracy = 0.1)
)
) +
scale_fill_manual(
name = "Marital Status / Labour Status",
values = c(
"married / labour_force" = "#00CD00", # green3
"married / outside_labour_force" = "#006400", # darkgreen
"single / labour_force" = "#EE0000", # red2
"single / outside_labour_force" = "#8B0000", # darkred
"widowed_divorced / labour_force" = "#0000FF", # blue1
"widowed_divorced / outside_labour_force" = "#00008B" # blue4
)
) +
scale_colour_brewer(
palette = "Set2",
name = "Age Band"
) +
labs(
title = "Age‐Banded Fertility Rates vs Female Labour Counts",
subtitle = "Bars by Marital & Labour Status; Lines by Age Band (Always Visible)",
x = "Year",
caption = "Data: SingStat & data.gov.sg"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(size = 12, color = "gray60"),
axis.title = element_text(face = "bold", size = 6),
legend.position = "none", # Turn off all legends - we'll create custom ones
panel.grid.minor = element_blank()
)
# 7. Convert to Plotly without any legends
interactive_ab <- ggplotly(p_ab, tooltip = "text") %>%
layout(
showlegend = FALSE, # Ensure no legend shows
margin = list(
l = 80,
r = 80,
b = 80, # Reduced bottom margin since legend will be separate
t = 80
)
)
# 8. Create separate HTML legend (same as before)
status_colors <- c(
"married / labour_force" = "#00CD00",
"married / outside_labour_force" = "#006400",
"single / labour_force" = "#EE0000",
"single / outside_labour_force" = "#8B0000",
"widowed_divorced / labour_force" = "#0000FF",
"widowed_divorced / outside_labour_force" = "#00008B"
)
status_labels <- c(
"Married – In Labour Force",
"Married – Outside Labour Force",
"Single – In Labour Force",
"Single – Outside Labour Force",
"Divorced/Widowed – In Labour Force",
"Divorced/Widowed – Outside Labour Force"
)
age_bands <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")
age_colors <- RColorBrewer::brewer.pal(length(age_bands), "Set2")
# Create HTML for status legend
status_legend_html <- paste(
'<div style="display: inline-block; margin-right: 20px; margin-bottom: 10px;">',
'<div style="font-weight: bold; margin-bottom: 8px; font-size: 14px;">Marital Status / Labour Status (Filterable)</div>',
paste(sapply(1:6, function(i) {
paste0(
'<div style="display: inline-block; margin-right: 15px; margin-bottom: 3px;">',
'<span style="display: inline-block; width: 12px; height: 12px; background-color: ',
status_colors[i], '; margin-right: 5px; vertical-align: middle;"></span>',
'<span style="font-size: 11px; vertical-align: middle;">', status_labels[i], '</span>',
'</div>'
)
}), collapse = ""),
'</div>'
)
# Create HTML for age band legend
age_legend_html <- paste(
'<div style="display: inline-block; margin-bottom: 10px;">',
'<div style="font-weight: bold; margin-bottom: 8px; font-size: 14px;">Age Band (Always Visible)</div>',
paste(sapply(seq_along(age_bands), function(i) {
paste0(
'<div style="display: inline-block; margin-right: 12px; margin-bottom: 3px;">',
'<span style="display: inline-block; width: 16px; height: 2px; background-color: ',
age_colors[i], '; margin-right: 5px; vertical-align: middle;"></span>',
'<span style="font-size: 11px; vertical-align: middle;">', age_bands[i], '</span>',
'</div>'
)
}), collapse = ""),
'</div>'
)
# Combine legends
combined_legend <- paste0(
'<div style="text-align: center; margin-top: 20px; padding: 10px; background-color: #f9f9f9; border: 1px solid #ddd; border-radius: 5px;">',
'<div style="margin-bottom: 15px;">', status_legend_html, '</div>',
'<div>', age_legend_html, '</div>',
'</div>'
)
# 9. Render final visualization with separate HTML legend
tagList(
div(style = "margin-bottom:20px;",
h3("Interactive Age‐Banded Fertility Dashboard"),
p("Use the dropdown to filter bars by labour status. Age band lines remain visible for all filters.")
),
filter_ab,
interactive_ab,
HTML(combined_legend)
)Interactive Age‐Banded Fertility Dashboard
Use the dropdown to filter bars by labour status. Age band lines remain visible for all filters.
Show Code
library(dplyr)
library(forcats)
library(crosstalk)
library(highcharter)
library(scales)
library(htmltools)
# 1. Aggregate counts
agg_counts <- final_dataset |>
group_by(year, marital_status, labour_status) |>
summarise(count = sum(count, na.rm = TRUE), .groups = "drop") |>
mutate(status = fct_inorder(paste(marital_status, labour_status, sep = " / ")))
# 2. SharedData for bar chart only
shared_bars <- SharedData$new(agg_counts, group = "labour_status_ab")
# 3. Fertility lines (no "All")
fertility_ab <- final_dataset |>
filter(age_band != "All") |>
distinct(year, age_band, fertility_rate)
# 4. Scaling factor for dual y-axis
max_count <- max(agg_counts$count, na.rm = TRUE)
max_fertility <- max(fertility_ab$fertility_rate, na.rm = TRUE)
scale_factor <- max_count / max_fertility
# 5. Bar colors
status_colors <- c(
"married / labour_force" = "#00CD00",
"married / outside_labour_force" = "#006400",
"single / labour_force" = "#EE0000",
"single / outside_labour_force" = "#8B0000",
"widowed_divorced / labour_force" = "#0000FF",
"widowed_divorced / outside_labour_force" = "#00008B"
)
# 6. Fertility line colors
age_bands <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")
age_colors <- RColorBrewer::brewer.pal(length(age_bands), "Set2")
names(age_colors) <- age_bands
# 7. Create base chart
chart <- highchart() |>
hc_chart(type = "column") |>
hc_title(text = "Age‐Banded Fertility Rates vs Female Labour Counts") |>
hc_subtitle(text = "Bars by Marital & Labour Status; Lines by Age Band (Always Visible)") |>
hc_xAxis(type = "category", title = list(text = "Year")) |>
hc_yAxis_multiples(
list(title = list(text = "Female Population (thousands)")),
list(title = list(text = "Fertility Rate (per thousand females)"), opposite = TRUE)
)
# 8. Add fertility lines (always visible)
for (band in age_bands) {
df_band <- fertility_ab |>
filter(age_band == band) |>
mutate(scaled_rate = fertility_rate * scale_factor)
chart <- chart |>
hc_add_series(
data = df_band,
type = "line",
name = band,
hcaes(x = year, y = scaled_rate),
color = age_colors[band],
yAxis = 0,
tooltip = list(pointFormat = paste0(
"<b>Age Band:</b> ", band,
"<br><b>Fertility:</b> {point.fertility_rate:.2f}"
)),
dataLabels = list(enabled = FALSE),
marker = list(enabled = TRUE),
)
}
# 9. Add bars (filterable)
for (status in names(status_colors)) {
df_status <- agg_counts |>
filter(status == !!status)
# Identify group
group <- if (grepl("outside_labour_force", status)) {
"outside"
} else {
"labour"
}
chart <- chart |>
hc_add_series(
data = df_status,
type = "column",
name = tools::toTitleCase(gsub("_", " ", status)),
hcaes(x = year, y = count),
color = status_colors[status],
yAxis = 0,
visible = FALSE, # Hide by default
id = status,
custom = list(group = group)
)
}
# 10. Wrap in highchartOutput using htmltools
hc_widget <- highchartOutput("hc_ab", height = "600px")
# 11. Crosstalk filter for bars
filter_ab <- filter_select(
id = "labour_ab_filter",
label = "Select Labour Status:",
sharedData = shared_bars,
group = ~labour_status
)
# 12. Create HTML legends
status_labels <- c(
"Married – In Labour Force",
"Married – Outside Labour Force",
"Single – In Labour Force",
"Single – Outside Labour Force",
"Divorced/Widowed – In Labour Force",
"Divorced/Widowed – Outside Labour Force"
)
status_legend_html <- paste(
'<div style="display: inline-block; margin-right: 20px; margin-bottom: 10px;">',
'<div style="font-weight: bold; margin-bottom: 8px; font-size: 14px;">Marital Status / Labour Status (Filterable)</div>',
paste(sapply(1:6, function(i) {
paste0(
'<div style="display: inline-block; margin-right: 15px; margin-bottom: 3px;">',
'<span style="display: inline-block; width: 12px; height: 12px; background-color: ',
status_colors[i], '; margin-right: 5px; vertical-align: middle;"></span>',
'<span style="font-size: 11px; vertical-align: middle;">', status_labels[i], '</span>',
'</div>'
)
}), collapse = ""),
'</div>'
)
age_legend_html <- paste(
'<div style="display: inline-block; margin-bottom: 10px;">',
'<div style="font-weight: bold; margin-bottom: 8px; font-size: 14px;">Age Band (Always Visible)</div>',
paste(sapply(seq_along(age_bands), function(i) {
paste0(
'<div style="display: inline-block; margin-right: 12px; margin-bottom: 3px;">',
'<span style="display: inline-block; width: 16px; height: 2px; background-color: ',
age_colors[i], '; margin-right: 5px; vertical-align: middle;"></span>',
'<span style="font-size: 11px; vertical-align: middle;">', age_bands[i], '</span>',
'</div>'
)
}), collapse = ""),
'</div>'
)
combined_legend <- paste0(
'<div style="text-align: center; margin-top: 20px; padding: 10px; background-color: #f9f9f9; border: 1px solid #ddd; border-radius: 5px;">',
'<div style="margin-bottom: 15px;">', status_legend_html, '</div>',
'<div>', age_legend_html, '</div>',
'</div>'
)
# 13. Render UI (assuming Shiny or R Markdown)
tagList(
div(style = "margin-bottom:20px;",
h3("Interactive Age‐Banded Fertility Dashboard"),
p("Use the dropdown to filter bars by labour status. Age band lines remain visible for all filters.")
),
filter_ab,
hc_widget,
HTML(combined_legend)
)Interactive Age‐Banded Fertility Dashboard
Use the dropdown to filter bars by labour status. Age band lines remain visible for all filters.
Show Code
chartShow Code
library(htmlwidgets)
chart <- chart %>%
onRender("
function(el, x) {
// Create dropdown
var container = document.createElement('div');
container.style.marginBottom = '10px';
var select = document.createElement('select');
select.innerHTML = `
<option value='labour'>Labour Force</option>
<option value='outside'>Outside Labour Force</option>
`;
select.style.padding = '4px';
container.appendChild(select);
el.parentNode.insertBefore(container, el);
function toggleGroup(selectedGroup) {
Highcharts.charts.forEach(chart => {
if (!chart) return;
chart.series.forEach(s => {
if (!s.userOptions.custom) return;
if (s.userOptions.custom.group === selectedGroup) {
s.show();
} else if (s.type === 'column') {
s.hide();
}
});
});
}
// Set initial view
toggleGroup('labour');
// Add listener
select.addEventListener('change', e => {
toggleGroup(e.target.value);
});
}
")
chart8.5 v2 graph
Finally, the last iteration of the graph is able to produce a fully interactive dashboard with the ability to filter by labour status. Features such as zooming in and selecting specific visualisations are implemented through plotly.
Made the age_band lines to persist when filtering
Show Code
# 1.1. Aggregate for bars
agg_counts <- final_dataset |>
group_by(year, marital_status, labour_status) |>
summarise(count = sum(count, na.rm = TRUE), .groups = "drop") |>
mutate(
status = forcats::fct_inorder(paste(marital_status, labour_status, sep = " / "))
)
# 1.2. Bars as SharedData
shared_bars <- SharedData$new(agg_counts, group = "labour_status_ab")
# 1.3. Filter for bars
filter_ab <- filter_select(
id = "labour_ab_filter",
label = "Select Labour Status:",
sharedData = shared_bars,
group = ~labour_status
)
# 1.4. Lines data as SharedData (same group!)
fertility_ab <- final_dataset |>
filter(age_band != "All") |>
distinct(year, age_band, fertility_rate)
shared_lines <- SharedData$new(fertility_ab, group = "labour_status_ab") # this is key
# 1.5. Compute scaling for dual y axis
max_count <- max(agg_counts$count, na.rm = TRUE)
max_fertility <- max(fertility_ab$fertility_rate, na.rm = TRUE)
scale_factor <- max_count / max_fertility
# 1.6. Colors
status_colors <- c(
"married / labour_force" = "#00CD00",
"married / outside_labour_force" = "#006400",
"single / labour_force" = "#EE0000",
"single / outside_labour_force" = "#8B0000",
"widowed_divorced / labour_force" = "#0000FF",
"widowed_divorced / outside_labour_force" = "#00008B"
)
status_labels <- c(
"Married – In Labour Force",
"Married – Outside Labour Force",
"Single – In Labour Force",
"Single – Outside Labour Force",
"Divorced/Widowed – In Labour Force",
"Divorced/Widowed – Outside Labour Force"
)
age_bands <- c("15-19", "20-24", "25-29", "30-34", "35-39", "40-44", "45-49")
age_colors <- RColorBrewer::brewer.pal(length(age_bands), "Set2")
names(age_colors) <- age_bands
# ----- 2. ggplot: both bars and lines use SharedData -----
p_ab <- ggplot() +
geom_col(
data = shared_bars,
aes(
x = year,
y = count,
fill = status,
group= status,
text = paste0(
"Year: ", year,
"<br>Status: ", tools::toTitleCase(gsub("_"," ", status)),
"<br>Count: ", comma(count)
)
),
position = position_dodge(width = 0.8),
colour = "white",
size = 0.2,
alpha = 0.8,
show.legend = FALSE
) +
geom_line(
data = shared_lines,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
text = paste0(
"Year: ", year,
"<br>Age band: ", age_band,
"<br>Fertility: ", round(fertility_rate, 2)
)
),
size = 1.2,
alpha = 0.85
) +
geom_point(
data = shared_lines,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
text = paste0(
"Year: ", year,
"<br>Age band: ", age_band,
"<br>Fertility: ", round(fertility_rate, 2)
)
),
size = 2,
alpha = 0.9
) +
scale_fill_manual(
name = "Marital Status / Labour Status",
values = status_colors
) +
scale_colour_manual(
values = age_colors,
name = "Age Band"
) +
scale_y_continuous(
name = "Female Population (thousands)",
labels = comma,
sec.axis = sec_axis(
transform = ~ . / scale_factor,
name = "Fertility Rate (per thousand females)",
labels = label_number(accuracy = 0.1)
)
) +
labs(
title = "Age‐Banded Fertility Rates vs Female Labour Counts",
subtitle = "Bars: Marital & Labour Status (filterable); Lines: Age Band (always visible)",
x = "Year",
caption = "Data: SingStat & data.gov.sg"
) +
theme_minimal(base_size = 12) +
theme(
plot.title = element_text(face = "bold", size = 13),
plot.subtitle = element_text(size = 11, color = "gray60"),
axis.title = element_text(face = "bold", size = 10),
legend.position = "none",
panel.grid.minor = element_blank()
)
# ----- 3. Plotly conversion -----
interactive_ab <- ggplotly(p_ab, tooltip = "text") %>%
layout(
showlegend = FALSE,
margin = list(l = 80, r = 80, b = 80, t = 80)
)
# ----- 4. Custom Legends as HTML -----
status_legend_html <- paste(
'<div style="display: inline-block; margin-right: 20px; margin-bottom: 10px;">',
'<div style="font-weight: bold; margin-bottom: 8px; font-size: 14px;">Marital Status / Labour Status (Filterable)</div>',
paste(sapply(1:6, function(i) {
paste0(
'<div style="display: inline-block; margin-right: 15px; margin-bottom: 3px;">',
'<span style="display: inline-block; width: 12px; height: 12px; background-color: ',
status_colors[i], '; margin-right: 5px; vertical-align: middle;"></span>',
'<span style="font-size: 11px; vertical-align: middle;">', status_labels[i], '</span>',
'</div>'
)
}), collapse = ""),
'</div>'
)
age_legend_html <- paste(
'<div style="display: inline-block; margin-bottom: 10px;">',
'<div style="font-weight: bold; margin-bottom: 8px; font-size: 14px;">Age Band (Always Visible)</div>',
paste(sapply(seq_along(age_bands), function(i) {
paste0(
'<div style="display: inline-block; margin-right: 12px; margin-bottom: 3px;">',
'<span style="display: inline-block; width: 30px; height: 5px; background-color: ',
age_colors[i], '; margin-right: 5px; vertical-align: middle; border-radius: 2px;"></span>',
'<span style="font-size: 11px; vertical-align: middle;">', age_bands[i], '</span>',
'</div>'
)
}), collapse = ""),
'</div>'
)
combined_legend <- paste0(
'<div style="text-align: center; margin-top: 20px; padding: 10px; background-color: #f9f9f9; border: 1px solid #ddd; border-radius: 5px;">',
'<div style="margin-bottom: 15px;">', status_legend_html, '</div>',
'<div>', age_legend_html, '</div>',
'</div>'
)
# ----- 5. Render -----
tagList(
div(style = "margin-bottom:20px;",
h3("Interactive Age‐Banded Fertility Dashboard"),
p("Use the dropdown to filter bars by marital/labour status. Age band lines always remain visible and legend persists.")
),
filter_ab,
interactive_ab,
HTML(combined_legend)
)Interactive Age‐Banded Fertility Dashboard
Use the dropdown to filter bars by marital/labour status. Age band lines always remain visible and legend persists.
Show Code
library(ggplot2)
library(dplyr)
library(crosstalk)
library(scales)
library(plotly)
library(RColorBrewer)
library(htmltools)
# Load and clean data
final_dataset <- read.csv("Singapore’s Fertility Crisis A Data-Driven Analysis of Socioeconomic Factors.csv")
filtered_dataset <- final_dataset %>%
filter(age_band != "All") %>%
mutate(
marital_status = tools::toTitleCase(trimws(as.character(marital_status))),
labour_status = tools::toTitleCase(trimws(as.character(labour_status))),
age_band = tools::toTitleCase(trimws(as.character(age_band))),
count = as.numeric(as.character(count)),
fertility_rate = as.numeric(as.character(fertility_rate))
)
# SharedData for interactivity
shared_data <- SharedData$new(filtered_dataset, group = "full_filter")
# Filter controls
filter_marital <- filter_select("filter_marital", "Marital Status:", shared_data, ~marital_status)
filter_labour <- filter_select("filter_labour", "Labour Status:", shared_data, ~labour_status)
filter_ageband <- filter_select("filter_ageband", "Age Band:", shared_data, ~age_band)
# Colors
labour_colors <- setNames(brewer.pal(3, "Set2")[1:2], unique(filtered_dataset$labour_status))
age_colors <- setNames(brewer.pal(length(unique(filtered_dataset$age_band)), "Dark2"),
unique(filtered_dataset$age_band))
# Scaling factor
scale_factor <- max(filtered_dataset$count, na.rm = TRUE) / max(filtered_dataset$fertility_rate, na.rm = TRUE)
# Plot
p <- ggplot() +
geom_col(
data = shared_data,
aes(
x = factor(year),
y = count,
fill = labour_status,
group = labour_status,
text = paste0(
"Year: ", year,
"<br>Marital Status: ", marital_status,
"<br>Labour Status: ", labour_status,
"<br>Count: ", comma(count)
)
),
position = position_dodge(width = 0.7),
width = 0.6,
colour = "white",
size = 0.2,
alpha = 0.8
) +
geom_line(
data = shared_data,
aes(
x = as.factor(year),
y = fertility_rate * scale_factor,
group = age_band,
colour = age_band,
text = paste0(
"Year: ", year,
"<br>Age Band: ", age_band,
"<br>Fertility Rate: ", round(fertility_rate, 2)
)
),
size = 1.2,
alpha = 0.9
) +
geom_point(
data = shared_data,
aes(
x = as.factor(year),
y = fertility_rate * scale_factor,
group = age_band,
colour = age_band
),
size = 2,
alpha = 0.9
) +
scale_fill_manual(values = labour_colors) +
scale_colour_manual(values = age_colors) +
scale_y_continuous(
name = "Female Population (thousands)",
labels = comma,
sec.axis = sec_axis(~ . / scale_factor,
name = "Fertility Rate (per thousand females)",
labels = label_number(accuracy = 0.1))
) +
labs(
title = "Fertility Rates vs Labour Participation",
subtitle = "Bars: Labour Status; Lines: Fertility by Age Band",
x = "Year",
caption = "Source: SingStat & data.gov.sg"
) +
theme_minimal() +
theme(
plot.title = element_text(face = "bold"),
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "none"
)
# Plotly conversion
plotly_output <- ggplotly(p, tooltip = "text")
# Legend
labour_legend <- paste(
'<div style="font-weight:bold;margin-bottom:8px;">Labour Status</div>',
paste(sapply(names(labour_colors), function(label) {
sprintf('<div style="margin-bottom:4px;"><span style="display:inline-block;width:12px;height:12px;background:%s;margin-right:6px;"></span><span style="font-size:12px;">%s</span></div>',
labour_colors[[label]], label)
}), collapse = "")
)
age_legend <- paste(
'<div style="font-weight:bold;margin-top:15px;margin-bottom:8px;">Age Bands</div>',
paste(sapply(names(age_colors), function(label) {
sprintf('<div style="margin-bottom:4px;"><span style="display:inline-block;width:12px;height:12px;background:%s;margin-right:6px;"></span><span style="font-size:12px;">%s</span></div>',
age_colors[[label]], label)
}), collapse = "")
)
combined_legend <- paste0(
'<div style="margin-top:20px;padding:10px;background:#f9f9f9;border:1px solid #ccc;border-radius:5px;">',
labour_legend, age_legend,
'</div>'
)
# Final UI
tagList(
div(style = "margin-bottom:20px;",
h3("Interactive Fertility & Labour Dashboard"),
p("Use filters to explore female labour force counts and age-banded fertility rates.")
),
div(style = "display: flex; gap: 15px; margin-bottom: 20px;",
filter_marital,
filter_labour,
filter_ageband
),
plotly_output,
HTML(combined_legend)
)Interactive Fertility & Labour Dashboard
Use filters to explore female labour force counts and age-banded fertility rates.
faceted ggiraph
Show Code
# Interactive plot
# Prepare interactive data IDs for persistent selection
agg_counts <- final_dataset |>
group_by(year, marital_status, labour_status) |>
summarise(count = sum(count, na.rm = TRUE), .groups = "drop") |>
mutate(
status = fct_inorder(paste(marital_status, labour_status, sep = " / "))
)
agg_counts <- agg_counts |>
mutate(data_id = paste0("bar_", marital_status, "_", year),
tooltip = paste0("Year: ", year,
"\nStatus: ", marital_status,
"\nCount: ", count))
fertility_ab <- final_dataset |>
filter(age_band != "All") |>
distinct(year, age_band, fertility_rate)
fertility_ab <- fertility_ab |>
mutate(data_id = paste0("line_", age_band),
tooltip = paste0("Year: ", year,
"\nAge Band: ", age_band,
"\nFertility: ", round(fertility_rate, 2)))
max_count <- max(agg_counts$count, na.rm = TRUE)
max_fertility <- max(fertility_ab$fertility_rate, na.rm = TRUE)
scale_factor <- max_count / max_fertility
# Build ggplot with interactive geoms
p <- ggplot() +
geom_col_interactive(
data = agg_counts,
aes(
x = year,
y = count,
fill = marital_status,
data_id = data_id,
tooltip = tooltip
),
position = position_dodge(width = 0.8),
alpha = 0.8
) +
geom_line_interactive(
data = fertility_ab,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
data_id = data_id,
tooltip = tooltip
),
size = 1
) +
geom_point_interactive(
data = fertility_ab,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
data_id = data_id,
tooltip = tooltip
),
size = 3
) +
scale_y_continuous(
name = "Female Population (thousands)",
labels = scales::comma,
sec.axis = sec_axis(~ . / scale_factor,
name = "Fertility Rate (per thousand females)",
labels = scales::label_number(accuracy = 0.1))
) +
scale_fill_brewer(palette = "Set1", name = "Marital Status") +
scale_colour_brewer(palette = "Set2", name = "Age Band") +
facet_wrap(~ labour_status, ncol = 1,
labeller = labeller(
labour_status = c(
labour_force = "In Labour Force",
outside_labour_force = "Outside Labour Force"
)
)) +
theme_minimal(base_size = 10) +
theme(
legend.position = "bottom",
strip.text = element_text(face = "bold", size = 12)
)
# Create interactive widget with zoom support enabled
gir <- girafe(
ggobj = p,
width_svg = 12,
height_svg = 8,
options = list(
opts_hover(css = "stroke-width:0.3px;opacity:2;"),
opts_hover_inv(css = "opacity:0.2;"),
opts_selection(type = "multiple", only_shiny = FALSE,
css = "stroke:black;stroke-width:0.3px;opacity:2;"),
opts_selection_inv(css = "opacity:0.2;"),
opts_zoom(max = 5, min = 1) # ← Added zoom control (1 = no zoom, up to 5x)
)
)
# Print the interactive plot
girfaceted ggiraph v2
Show Code
# Prepare aggregated counts with combined mutate and HTML tooltips
agg_counts <- final_dataset %>%
group_by(year, marital_status, labour_status) %>%
summarise(count = sum(count, na.rm = TRUE), .groups = "drop") %>%
mutate(
status = fct_inorder(paste(marital_status, labour_status, sep = " / ")),
data_id = paste0("year_", year),
tooltip = glue(
"<b>Year:</b> {year}<br/>",
"<b>Status:</b> {marital_status} / {labour_status}<br/>",
"<b>Count:</b> {comma(count)}"
)
)
# Prepare fertility rates with unified data_id and HTML tooltips
fertility_ab <- final_dataset %>%
filter(age_band != "All") %>%
distinct(year, age_band, fertility_rate) %>%
mutate(
data_id = paste0("year_", year),
tooltip = glue(
"<b>Year:</b> {year}<br/>",
"<b>Age Band:</b> {age_band}<br/>",
"<b>Fertility:</b> {round(fertility_rate, 2)}"
)
)
# Compute scaling factor once
max_count <- max(agg_counts$count, na.rm = TRUE)
max_fertility <- max(fertility_ab$fertility_rate, na.rm = TRUE)
scale_factor <- max_count / max_fertility
# Build interactive ggplot
p <- ggplot() +
geom_col_interactive(
data = agg_counts,
aes(
x = year,
y = count,
fill = marital_status,
data_id = data_id,
tooltip = tooltip
),
position = position_dodge(width = 0.8),
alpha = 0.8
) +
geom_line_interactive(
data = fertility_ab,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
data_id = data_id,
tooltip = tooltip
),
size = 1
) +
geom_point_interactive(
data = fertility_ab,
aes(
x = year,
y = fertility_rate * scale_factor,
colour = age_band,
group = age_band,
data_id = data_id,
tooltip = tooltip
),
size = 3
) +
scale_y_continuous(
name = "Female Population (thousands)",
labels = comma,
sec.axis = sec_axis(
~ . / scale_factor,
name = "Fertility Rate (per thousand females)",
labels = label_number(accuracy = 0.1)
)
) +
scale_fill_viridis_d(option = "C", name = "Marital Status") +
scale_colour_viridis_d(option = "B", name = "Age Band") +
facet_wrap(
~ labour_status,
ncol = 1,
labeller = labeller(
labour_status = c(
labour_force = "In Labour Force",
outside_labour_force = "Outside Labour Force"
)
)
) +
theme_minimal(base_size = 10) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
legend.position = "bottom",
strip.text = element_text(face = "bold", size = 12)
)
# Render interactive plot with refined options
gir <- girafe(
ggobj = p,
width_svg = 12,
height_svg= 8,
options = list(
opts_hover(css = "stroke-width:0.3px;opacity:2;"),
opts_hover_inv(css = "opacity:0.2;"),
opts_selection(
type = "single",
only_shiny = FALSE,
css = "stroke:black;stroke-width:0.3px;opacity:2;"
),
opts_selection_inv(css = "opacity:0.2;"),
opts_zoom(max = 5, min = 1),
opts_tooltip(
css = "background: rgba(0,0,0,0.75);
color: white;
padding: 5px;
border-radius: 5px;"
)
)
)
girSummary of Improvements:
Streamlined data prep: Combined mutate() calls and used glue::glue() for cleaner, HTML‐enhanced tooltips.
Cache chunk: Added cache=TRUE to avoid re‐computing on every render.
Color accessibility: Switched to Viridis palettes (scale_fill_viridis_d, scale_colour_viridis_d) for colorblind‐friendly visuals.
Unified selection IDs: Used a single data_id = paste0(“year_”, year) for both bars and lines—clicking a year now highlights all elements for that year.
Improved interactivity:
Changed to single‐click selection (opts_selection(type=“single”)).
Added custom tooltip styling via opts_tooltip().
Aesthetic tweaks: Rotated x‐axis labels for readability, bumped strip text to bold, and positioned legend at the bottom.
Performance optimization: Computed the scaling factor once outside of the plot call.
9 Key Findings & Insights
9.1 Summary Statistics
9.2 Statistical Significance Testing
10 Team Contributions
| Team Member | Tasks |
|---|---|
| Guo Zi Qiang Robin | |
| Chew Tze Han | |
| Cheong Wai Hong Jared | |
| Akram | |
| Gregory Tan |